Barrier Synchronization Pattern
Abstract
Parallel algorithms divide work into multiple concurrent tasks. These tasks, or units of execution (UEs), may run in parallel depending on the physical resources available. UEs commonly proceed in phases, where the next phase cannot start until all UEs have completed the previous one, typically because each UE depends on data written by other UEs during the previous phase. Since UEs may execute at different speeds, they need to wait for one another before proceeding to the next phase. Barriers are commonly used to enforce such waiting. Figure 1 illustrates how a barrier works: a UE executes its code until it reaches a barrier, then waits until all other UEs have reached that barrier before proceeding.

Consider the Barnes-Hut [BH86] N-body simulation algorithm. This is an iterative algorithm with well-defined phases: building the octree, calculating the forces between bodies, and updating the position and velocity of each body. One way to parallelize the algorithm is to have multiple UEs execute each of the three phases. However, no UE can proceed to the next phase until all UEs have finished the previous one; it makes no sense to update positions while some UEs are still calculating forces. A barrier at which all UEs wait for each other before continuing with their respective computations is called a global barrier. We distinguish it from a local barrier, where a parent task waits for all of its child tasks to finish before it can continue.
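The phased behaviour described above can be sketched with Python's standard `threading.Barrier`. This is a minimal illustration, not the Barnes-Hut algorithm itself: the function `run_phases`, the toy one-dimensional "force" rule, and the parameter names are all assumptions made for the example. Each thread plays the role of one UE that owns a single body. The two `barrier.wait()` calls act as global barriers between phases, and the final `join` loop acts as a local barrier in which the parent waits for its children.

```python
import threading


def run_phases(num_workers=4, steps=3):
    """Toy 1-D stand-in for a phased simulation; each UE owns one body."""
    positions = [float(i) for i in range(num_workers)]
    forces = [0.0] * num_workers
    # Global barrier shared by all UEs.
    barrier = threading.Barrier(num_workers)

    def worker(i):
        for _ in range(steps):
            # Phase 1: read every position, write only forces[i].
            center = sum(positions) / num_workers
            forces[i] = center - positions[i]
            # No UE may update positions until all forces are written.
            barrier.wait()

            # Phase 2: read forces[i], write only positions[i].
            positions[i] += 0.5 * forces[i]
            # No UE may start the next step until all positions are updated.
            barrier.wait()

    threads = [
        threading.Thread(target=worker, args=(i,))
        for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    # Local barrier: the parent waits for all of its child tasks to finish.
    for t in threads:
        t.join()
    return positions


if __name__ == "__main__":
    print(run_phases())
```

Without the first `barrier.wait()`, a fast UE could move its body while a slower UE is still reading positions to compute its force, which is exactly the data dependency the pattern is meant to protect.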
Similar resources
Warps and Atomics: Beyond Barrier Synchronization in the Verification of GPU Kernels
We describe the design and implementation of methods to support reasoning about data races in GPU kernels where constructs other than the standard barrier primitive are used for synchronization. At one extreme we consider kernels that exploit implicit, coarse-grained synchronization between threads in the same warp, a feature provided by many architectures. At the other extreme we consider kern...
Barrier Synchronization on a Loaded SMP Using Two-Phase Waiting Algorithms
Little work has been done on the performance of barrier synchronization using two-phase blocking, as the common wisdom is that it is useless to spin if the total number of threads in the system exceeds the number of processors. We challenge this view and show that it may be beneficial to spin-wait if the spinning period is set to be a bit more than twice the context switch overhead (rather than...
Area and Performance Optimization of Barrier Synchronization on Multi-core Network-on-Chips
Barrier synchronization is commonly and widely used to synchronize the execution of parallel processor cores on multi-core Network-on-Chips (NoCs). Since its global nature may cause heavy serialization resulting in large performance penalty, barrier synchronization should be carefully designed to have low latency communication and to minimize overall completion time. Therefore, in the paper, we...
ASYNC Loop Constructs for Relaxed Synchronization
Conventional iterative solvers for partial differential equations impose strict data dependencies between each solution point and its neighbors. When implemented in OpenMP, they repeatedly execute barrier synchronization in each iterative step to ensure that data dependencies are strictly satisfied. We propose new parallel annotations to support an asynchronous computation model for iterative s...
Fast Barrier Synchronization in Wormhole k-ary n-cube Networks with Multidestination Worms
This paper presents a new approach to implement fast barrier synchronization in wormhole k-ary n-cubes. The novelty lies in using multidestination messages instead of the traditional single destination messages. Two different multidestination worm types, gather and broadcasting, are introduced to implement the report and wake-up phases of barrier synchronization, respectively. Algorithms for co...
Fast Barrier Synchronization on Shared Fast Ethernet
Shared LAN is presently the most widespread networking technology, due to its extremely low cost and favourable cost/performance ratio. Clusters of Personal Computers (PCs) leveraging shared 100base-T Ethernet may currently offer the best price/performance in parallel processing. Most numerical parallel algorithms make heavy use of collective communications and especially barrier synchronization...